fix(graph): introduce Status.SKIPPED for cancel_node; graph continues downstream by Zelys-DFKH · Pull Request #2258 · strands-agents/harness-sdk

Zelys-DFKH · 2026-05-06T22:21:32Z

Description

When cancel_node is set in a BeforeNodeCallEvent hook, _execute_node raised RuntimeError after emitting MultiAgentNodeCancelEvent, which propagated through _execute_nodes_parallel and killed the graph. Downstream nodes never ran, the overall status became FAILED, and interrupt-and-resume workflows had no recovery path.

The fix introduces Status.SKIPPED. The bypassed node records result=None and status=Status.SKIPPED, lands in completed_nodes so downstream readiness checks are satisfied, and returns cleanly. Using Status.COMPLETED here would be wrong: a node that never ran is not completed, and conflating the two makes it impossible to distinguish "ran successfully" from "intentionally bypassed" in loop control and observability tools.

Three fixes came alongside:

_build_node_input filters SKIPPED deps before building dependency_results, so when all upstream deps are skipped the downstream node receives the original task with no orphaned "From node_x:" header
_from_dict reads the persisted NodeResult.status to restore Status.SKIPPED correctly on resume, rather than hardcoding Status.COMPLETED for everything in completed_nodes
GraphResult gains skipped_nodes: int; completed_nodes now counts only nodes that actually ran

Also opened #2401 for the complementary CANCELLED semantic (abort-branch, downstream does not run). That issue raises the cancel_node → skip_node naming question; if maintainers prefer Option A, the rename can happen in this PR before it merges.

Related Issues

Resolves #2240

Documentation PR

N/A

Type of Change

Bug fix

Testing

Ran hatch run prepare. Updated test_graph_cancel_node (parametrized ×2) to assert Status.SKIPPED on both NodeResult.status and node.execution_status. Updated test_graph_cancel_node_downstream_executes to assert: downstream node executes (step_b.stream_async.assert_called_once()), step_b receives only the original task with no orphaned header, graph_result.skipped_nodes == 1, and graph_result.completed_nodes == 1.

I ran hatch run prepare

Checklist

I have read the CONTRIBUTING document
I have added any necessary tests that prove my fix is effective or my feature works
I have updated the documentation accordingly
I have added an appropriate example to the documentation to outline the feature, or no new docs are needed
My changes generate no new warnings
Any dependent changes have been merged and published

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

yananym · 2026-05-29T17:10:19Z

Thanks for working on this @Zelys-DFKH — this addresses a real pain point. I filed the original issue (#2240) and have been thinking about the design here, especially considering future graph looping support (revisiting previous nodes via edge conditions).

Proposal: Introduce two distinct statuses — `SKIPPED` and `CANCELLED`

After looking at how the TypeScript Strands SDK and LangGraph handle this, I think we should distinguish between two semantics:

1. `SKIPPED` — skip this node, continue downstream (what this PR does)

This is the "skip and continue" semantic. The node is intentionally bypassed, but downstream dependents should still execute. This is what my bug report asks for.

Design:

Introduce Status.SKIPPED enum value (not COMPLETED — a node that never ran shouldn't be marked as completed)
Dependency resolution treats SKIPPED the same as COMPLETED — the node counts as "satisfied" for downstream readiness checks in _find_newly_ready_nodes / _is_node_ready_for_resume
_build_node_input should skip the header entirely for SKIPPED deps (no empty "From node_x:" with nothing underneath)
reset_on_revisit clears SKIPPED nodes (same as COMPLETED) — they become eligible for execution again on loop re-entry
The NodeResult should not store a RuntimeError — a skipped node isn't an error. Use a lightweight marker (e.g., None result or a dedicated sentinel) so get_agent_results() cleanly returns []

The SKIPPED status is valuable because:

It's semantically honest — a node that never ran isn't "completed"
Edge conditions on outgoing edges can inspect execution_status == SKIPPED vs COMPLETED and route differently (powerful for loop control)
Observability — dashboards/logs can distinguish "ran successfully" from "intentionally bypassed"
No ambiguity with nodes that ran but legitimately returned empty output

2. `CANCELLED` — kill this branch gracefully (align with TS SDK)

This is what the TypeScript SDK does with beforeEvent.cancel: the node is cancelled, marked as a terminus, and downstream nodes do NOT execute. No error propagates — the graph can still complete via other branches.

This could be a separate hook field (e.g., cancel_node for cancel, skip_node for skip) or a single field with enum values ("skip" vs "cancel").

Why both are needed (especially for looping)

Consider a loop: A → B → C → (edge back to A if condition)

SKIPPED: On iteration 2, B is skipped because it already gathered data. C still runs, the back-edge fires, and the loop continues. This is the common case for conditional execution within loops.
CANCELLED: You want to abort a branch entirely — e.g., an optional side-branch that's no longer relevant. The main loop path continues via other branches.

Without the distinction, you can't express "skip but continue" vs "stop this path" — both of which are legitimate control flow decisions.

Complementary mechanism: Edge conditions with `invocation_state`

PR #2305 (merged) added invocation_state to edge condition evaluation, which provides an alternative way to route around nodes at the edge level. The two mechanisms are complementary:

Edge conditions: Control whether to traverse a path (node never enters the execution batch)
skip_node / cancel_node hooks: Control whether to execute a node that's already been reached (node is in the batch but gets bypassed)

Both are needed because edge conditions are evaluated before batch formation, while hooks fire after.

Happy to collaborate on the implementation if this direction makes sense. The core of your PR is right — the change from "raise RuntimeError" to "return cleanly" is exactly the fix. The suggestion is to layer proper status semantics on top.

Zelys-DFKH · 2026-05-30T00:00:17Z

Agreed on SKIPPED vs COMPLETED. Updated; your analysis of the header-leak risk in _build_node_input was spot-on, too.

result=None, Status.SKIPPED throughout. _build_node_input filters SKIPPED deps before building dependency_results, so when all upstream deps are skipped the downstream node gets just the original task with no orphaned "From step_a:" header. _from_dict reads the persisted NodeResult.status to restore Status.SKIPPED correctly on resume. Also added skipped_nodes to GraphResult and adjusted completed_nodes to exclude SKIPPED nodes, which closes the observability gap.

SKIPPED nodes live in completed_nodes rather than a separate set. That satisfies both _find_newly_ready_nodes and _compute_ready_nodes_for_resume without touching either function. reset_on_revisit works correctly for loop re-entry, too: SKIPPED nodes are in completed_nodes, so they get cleared and become eligible for execution again on the next iteration.

edge.should_traverse is still evaluated for outgoing edges of SKIPPED nodes, which preserves the division you described: edge conditions control whether to enter a path before batch formation, hooks control whether to execute after.

CANCELLED (abort-branch semantics) is a separate concern; I'll open an issue.

Zelys-DFKH · 2026-05-30T09:24:19Z

Opened #2401 to track CANCELLED (abort-branch semantics). It also surfaces the cancel_node → skip_node naming question — happy to make that rename in this PR if maintainers prefer Option A before it merges.

yananym · 2026-06-05T20:52:48Z

Do you mind rebasing? I am asking Strands team to review

github-actions · 2026-06-08T15:22:15Z

-            if before_event.cancel_node:
-                cancel_message = (
-                    before_event.cancel_node if isinstance(before_event.cancel_node, str) else "node cancelled by user"
+            skip_value = before_event.skip_node or before_event.cancel_node


Issue: The skip_node or cancel_node short-circuit means if both are set, only skip_node's value is used (since it's checked first with or). This interaction isn't documented and could surprise users who set both fields.

Suggestion: Add a brief note in the BeforeNodeCallEvent docstring about precedence, e.g., "If both skip_node and cancel_node are set, skip_node takes precedence." Alternatively, you could log a warning when both are truthy.

Good catch. Added a note to the skip_node docstring: "Takes precedence over cancel_node when both are truthy."

github-actions · 2026-06-08T15:22:16Z


    total_nodes: int = 0
    completed_nodes: int = 0
+    skipped_nodes: int = 0


Issue: The GraphResult.completed_nodes field previously counted all settled nodes but now excludes skipped nodes. This is a behavioral change to an existing public field that users may be relying on (e.g., if result.completed_nodes == result.total_nodes: ... to check "all done").

Suggestion: Document this breaking change prominently in the PR description. Users who relied on completed_nodes == total_nodes to mean "graph finished all work" would now need completed_nodes + skipped_nodes == total_nodes (excluding failures/interrupts). Consider whether this warrants a note in a changelog or migration guide.

The direction is the opposite — skipped nodes are added to completed_nodes (the skip path calls self.state.completed_nodes.add(node)). Before this PR, cancel_node raised RuntimeError, so skip didn't exist as a concept; nothing assumed completed_nodes excluded skipped nodes. Callers that want to distinguish can check node.execution_status == Status.SKIPPED on individual results.

Correction to my previous reply: GraphResult.completed_nodes (the public int in _build_result) only counts Status.COMPLETED nodes — skipped ones are filtered by the if n.execution_status == Status.COMPLETED guard. The bot's observation about completed_nodes == total_nodes is right for new callers using skip_node: they'd need completed_nodes + skipped_nodes == total_nodes to mean "all work settled." That's what skipped_nodes is for. The backward-compat point holds: cancel_node raised RuntimeError before this PR, so no existing code relied on skip behavior and there's nothing to migrate.

github-actions · 2026-06-08T15:22:18Z

 )
 @pytest.mark.asyncio
-async def test_graph_cancel_node(cancel_node, cancel_message):
+async def test_graph_cancel_node_deprecated(cancel_node, cancel_message):


Issue: The test name says "deprecated" and the docstring says "emits DeprecationWarning", but since the deprecation warning was intentionally removed from the source code (per review feedback from yananym), this test name and docstring are now inaccurate.

Suggestion: If no deprecation warning will be emitted, rename to test_graph_cancel_node_legacy or test_graph_cancel_node_backward_compat and update the docstring to say "cancel_node still works as a backward-compatible alias for skip_node".

The deprecation warning wasn't removed — BeforeNodeCallEvent.__setattr__ still emits it when cancel_node is assigned a truthy value. The test names are accurate.

github-actions · 2026-06-08T15:22:19Z

Assessment: Request Changes

The core fix is sound — replacing RuntimeError propagation with Status.SKIPPED is the right approach, and the dependency filtering/resume logic is well-implemented. However, there's a critical test/source mismatch that will cause CI failures.

Review Categories

Critical: Tests assert pytest.warns(DeprecationWarning, match="cancel_node is deprecated") in 3 test functions, but no deprecation warning is emitted anywhere in the source code. These tests will fail. Either emit the warning or remove the assertion.
API semantics: GraphResult.completed_nodes field is a behavioral breaking change (now excludes skipped nodes) — should be documented.
Naming consistency: cancel_message variable name is a leftover from the old semantics; should be skip_message to match the new API.
Swarm behavior: Status.FAILED on skip is inherited from the old code but deserves a comment explaining the design rationale.

Good work on the _build_node_input filtering, _from_dict status restoration, and thorough test coverage for the new skip_node path.

Closes strands-agents#2240

Replace Status.COMPLETED + RuntimeError with Status.SKIPPED + None for nodes bypassed via cancel_node. A skipped node never ran and should not be marked completed; downstream readiness checks treat SKIPPED the same as COMPLETED by keeping skipped nodes in completed_nodes. - Add Status.SKIPPED enum value - NodeResult.result accepts None; to_dict/from_dict round-trip via {"type": "skipped"}; get_agent_results() returns [] for None - _build_node_input filters SKIPPED deps before building dependency_results, so downstream nodes receive only the original task with no orphaned "From node_x:" header - _from_dict restores Status.SKIPPED on node.execution_status via the persisted NodeResult.status rather than blindly setting COMPLETED - GraphResult gains skipped_nodes counter; completed_nodes now counts only nodes that actually ran - reset_on_revisit correctly clears SKIPPED nodes for loop re-entry (they are already in completed_nodes)

Introduce BeforeNodeCallEvent.skip_node as the preferred API for bypassing a node in graph/swarm hooks. BeforeNodeCallEvent.cancel_node continues to work but now emits a DeprecationWarning at the assignment site via __setattr__, so the warning points to user code rather than framework internals. Introduce MultiAgentNodeSkipEvent (type: multiagent_node_skip) for the skip signal. MultiAgentNodeCancelEvent is kept but documented as reserved for the future abort-branch CANCELLED semantic per strands-agents#2401. Behaviour: - Graph: skip-and-continue — node is settled as SKIPPED, downstream nodes proceed normally. - Swarm: skip-and-abort — existing abort behaviour is unchanged; swarm stops the current run. - Warning fires unconditionally when cancel_node is set truthy, even when skip_node is also set. - Falsy values (False, "") mean "do not skip" on both fields.

…ify semantics The variable name cancel_message was inherited from the old cancel_node API. With skip_node as the primary interface, rename to skip_message throughout. Also add docstrings to GraphResult.completed_nodes and skipped_nodes to document that completed_nodes counts only nodes that ran to completion (not skipped nodes), and add an inline comment explaining why a skipped node in a linear swarm sets Status.FAILED rather than Status.SKIPPED.

…ip semantics Shorten overlong inline comment in swarm.py from 224 to 119 chars (ruff E501). Replace the stale RST deprecated directive on MultiAgentNodeCancelEvent with a plain note that the class is not currently emitted (reserved for issue strands-agents#2401). Add cancel_node alias mention to MultiAgentNodeSkipEvent docstring. Update test_cancel.py: swap multiagent_node_cancel to multiagent_node_skip, remove the RuntimeError expectation from the graph test (graph continues after skip rather than raising), and expect Status.COMPLETED for the graph case.

…allEvent

Zelys-DFKH had a problem deploying to manual-approval May 6, 2026 22:21 — with GitHub Actions Failure

github-actions Bot added the size/s label May 6, 2026

Zelys-DFKH had a problem deploying to manual-approval May 6, 2026 22:21 — with GitHub Actions Failure

Zelys-DFKH mentioned this pull request May 8, 2026

[BUG] cancel_node in BeforeNodeCallEvent raises RuntimeError that kills the entire graph on resume #2240

Open

3 tasks

yonib05 added area-multiagent Multi-agent related area-hooks Features or requests that might be implementable via hooks labels May 27, 2026

yonib05 added python Pull requests that update python code bug Something isn't working labels May 29, 2026

yananym mentioned this pull request May 29, 2026

[FEATURE] Support graph cycles (looping back to previous nodes via edge conditions) #2387

Open

github-actions Bot added size/m and removed size/s labels May 30, 2026

Zelys-DFKH requested a deployment to manual-approval May 30, 2026 00:01 — with GitHub Actions Waiting

Zelys-DFKH mentioned this pull request May 30, 2026

[FEATURE] Status.CANCELLED: abort a branch without running downstream or propagating errors #2401

Closed

Zelys-DFKH changed the title ~~fix(graph): treat cancel_node as control flow, not fatal error~~ fix(graph): introduce Status.SKIPPED for cancel_node; graph continues downstream May 30, 2026

yananym reviewed Jun 5, 2026

View reviewed changes

Comment thread strands-py/src/strands/hooks/events.py

Zelys-DFKH force-pushed the fix/graph-cancel-node-graceful-skip branch from 14642dc to 593da1d Compare June 5, 2026 20:26

github-actions Bot added size/m and removed size/m labels Jun 5, 2026

Zelys-DFKH requested a deployment to manual-approval June 5, 2026 20:30 — with GitHub Actions Waiting

github-actions Bot added size/m and removed size/m labels Jun 5, 2026

JackYPCOnline self-assigned this Jun 5, 2026

Zelys-DFKH force-pushed the fix/graph-cancel-node-graceful-skip branch from 333dc08 to 504027b Compare June 5, 2026 21:01

github-actions Bot added size/m and removed size/m labels Jun 5, 2026

Zelys-DFKH requested a deployment to manual-approval June 5, 2026 21:01 — with GitHub Actions Waiting

github-actions Bot reviewed Jun 8, 2026

View reviewed changes

github-actions Bot removed the strands-running label Jun 8, 2026

Zelys-DFKH force-pushed the fix/graph-cancel-node-graceful-skip branch from 06e6cf0 to c162369 Compare June 8, 2026 18:02

github-actions Bot added size/m and removed size/m labels Jun 8, 2026

Zelys-DFKH requested a deployment to manual-approval June 8, 2026 18:02 — with GitHub Actions Waiting

github-actions Bot added size/m and removed size/m labels Jun 8, 2026

Zelys-DFKH requested a deployment to manual-approval June 8, 2026 19:09 — with GitHub Actions Waiting

github-actions Bot added size/m and removed size/m labels Jun 8, 2026

Zelys-DFKH requested a deployment to manual-approval June 8, 2026 21:34 — with GitHub Actions Waiting

Zelys-DFKH added 7 commits June 10, 2026 18:00

fix(graph): treat cancel_node as control flow, not fatal error

5eda187

Closes strands-agents#2240

docs(graph): update stale cancel_node reference in GraphState docstring

27f957b

docs(hooks): document skip_node/cancel_node precedence in BeforeNodeC…

9198f8e

…allEvent

Zelys-DFKH force-pushed the fix/graph-cancel-node-graceful-skip branch from 1ab0a04 to 9198f8e Compare June 10, 2026 23:01

github-actions Bot added size/m and removed size/m labels Jun 10, 2026

Zelys-DFKH requested a deployment to manual-approval June 10, 2026 23:01 — with GitHub Actions Waiting

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(graph): introduce Status.SKIPPED for cancel_node; graph continues downstream#2258

fix(graph): introduce Status.SKIPPED for cancel_node; graph continues downstream#2258
Zelys-DFKH wants to merge 7 commits into
strands-agents:mainfrom
Zelys-DFKH:fix/graph-cancel-node-graceful-skip

Zelys-DFKH commented May 6, 2026 •

edited

Loading

Uh oh!

yananym commented May 29, 2026

Uh oh!

Zelys-DFKH commented May 30, 2026

Uh oh!

Zelys-DFKH commented May 30, 2026

Uh oh!

Uh oh!

yananym commented Jun 5, 2026

Uh oh!

github-actions Bot Jun 8, 2026

Uh oh!

Zelys-DFKH Jun 8, 2026

Uh oh!

github-actions Bot Jun 8, 2026

Uh oh!

Zelys-DFKH Jun 8, 2026

Uh oh!

Zelys-DFKH Jun 8, 2026

Uh oh!

github-actions Bot Jun 8, 2026

Uh oh!

Zelys-DFKH Jun 8, 2026

Uh oh!

github-actions Bot commented Jun 8, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

Zelys-DFKH commented May 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Related Issues

Documentation PR

Type of Change

Testing

Checklist

Uh oh!

yananym commented May 29, 2026

Proposal: Introduce two distinct statuses — SKIPPED and CANCELLED

1. SKIPPED — skip this node, continue downstream (what this PR does)

2. CANCELLED — kill this branch gracefully (align with TS SDK)

Why both are needed (especially for looping)

Complementary mechanism: Edge conditions with invocation_state

Uh oh!

Zelys-DFKH commented May 30, 2026

Uh oh!

Zelys-DFKH commented May 30, 2026

Uh oh!

Uh oh!

yananym commented Jun 5, 2026

Uh oh!

github-actions Bot Jun 8, 2026

Choose a reason for hiding this comment

Uh oh!

Zelys-DFKH Jun 8, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot Jun 8, 2026

Choose a reason for hiding this comment

Uh oh!

Zelys-DFKH Jun 8, 2026

Choose a reason for hiding this comment

Uh oh!

Zelys-DFKH Jun 8, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot Jun 8, 2026

Choose a reason for hiding this comment

Uh oh!

Zelys-DFKH Jun 8, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented Jun 8, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Zelys-DFKH commented May 6, 2026 •

edited

Loading

Proposal: Introduce two distinct statuses — `SKIPPED` and `CANCELLED`

1. `SKIPPED` — skip this node, continue downstream (what this PR does)

2. `CANCELLED` — kill this branch gracefully (align with TS SDK)

Complementary mechanism: Edge conditions with `invocation_state`